NSF PAR Search | NSF Public Access Repository

TripletGO: Integrating Transcript Expression Profiles with Protein Homology Inferences for Gene Function Prediction

https://doi.org/10.1016/j.gpb.2022.03.001

Zhu, Yi-Heng; Zhang, Chengxin; Liu, Yan; Omenn, Gilbert S.; Freddolino, Peter L.; Yu, Dong-Jun; Zhang, Yang (October 2022, Genomics, Proteomics & Bioinformatics)

Deducing high-accuracy protein contact-maps from a triplet of coevolutionary matrices through deep residual convolutional networks

https://doi.org/10.1371/journal.pcbi.1008865

Li, Yang; Zhang, Chengxin; Bell, Eric W.; Zheng, Wei; Zhou, Xiaogen; Yu, Dong-Jun; Zhang, Yang (March 2021, PLOS Computational Biology)
Kolodny, Rachel (Ed.)
The topology of protein folds can be specified by the inter-residue contact-maps and accurate contact-map prediction can help ab initio structure folding. We developed TripletRes to deduce protein contact-maps from discretized distance profiles by end-to-end training of deep residual neural-networks. Compared to previous approaches, the major advantage of TripletRes is in its ability to learn and directly fuse a triplet of coevolutionary matrices extracted from the whole-genome and metagenome databases and therefore minimize the information loss during the course of contact model training. TripletRes was tested on a large set of 245 non-homologous proteins from CASP 11&12 and CAMEO experiments and outperformed other top methods from CASP12 by at least 58.4% for the CASP 11&12 targets and 44.4% for the CAMEO targets in the top-L long-range contact precision. On the 31 FM targets from the latest CASP13 challenge, TripletRes achieved the highest precision (71.6%) for the top-L/5 long-range contact predictions. It was also shown that a simple re-training of the TripletRes model with more proteins can lead to further improvement with precisions comparable to state-of-the-art methods developed after CASP13. These results demonstrate a novel efficient approach to extend the power of deep convolutional networks for high-accuracy medium- and long-range protein contact-map predictions starting from primary sequences, which are critical for constructing 3D structure of proteins that lack homologous templates in the PDB library.
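The top-L and top-L/5 long-range precision figures quoted above can be illustrated with a minimal sketch. It is not the TripletRes evaluation code; it assumes the common CASP convention that a long-range pair has a sequence separation of at least 24 residues, and the function name, argument layout, and toy scores are all hypothetical.

```python
def top_l_long_range_precision(scores, contacts, seq_len, divisor=1, min_sep=24):
    """Precision of the top L/divisor long-range contact predictions.

    scores   -- dict mapping (i, j) residue pairs (i < j) to predicted confidence
    contacts -- set of (i, j) pairs that are true contacts
    seq_len  -- sequence length L
    min_sep  -- minimum |i - j| for a long-range pair (assumed convention: 24)
    """
    # Keep only long-range pairs, then rank them by predicted confidence.
    long_range = [(p, pair) for pair, p in scores.items()
                  if pair[1] - pair[0] >= min_sep]
    long_range.sort(key=lambda item: item[0], reverse=True)

    # Select the top L/divisor predictions (at least one).
    k = max(1, seq_len // divisor)
    top = long_range[:k]
    if not top:
        return 0.0

    # Precision = fraction of selected pairs that are true contacts.
    hits = sum(1 for _, pair in top if pair in contacts)
    return hits / len(top)


# Toy data for a hypothetical 30-residue protein.
toy_scores = {
    (0, 25): 0.9, (1, 28): 0.8, (2, 27): 0.7, (0, 29): 0.6,
    (3, 10): 0.99,  # short-range pair, ignored by the metric
    (4, 29): 0.5, (5, 29): 0.4, (1, 26): 0.3,
}
toy_contacts = {(0, 25), (2, 27), (5, 29), (3, 10)}

top_l5 = top_l_long_range_precision(toy_scores, toy_contacts, seq_len=30, divisor=5)
print(top_l5)  # 0.5
```

Dividing by the number of pairs actually selected, rather than by L/divisor, is a design choice of this sketch; the two agree whenever enough long-range predictions are available.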
Accurate multistage prediction of protein crystallization propensity using deep-cascade forest with sequence-based features

https://doi.org/10.1093/bib/bbaa076

Zhu, Yi-Heng; Hu, Jun; Ge, Fang; Li, Fuyi; Song, Jiangning; Zhang, Yang; Yu, Dong-Jun (May 2020, Briefings in Bioinformatics)

X-ray crystallography is the major approach for determining atomic-level protein structures. Because not all proteins can be easily crystallized, accurate prediction of protein crystallization propensity provides critical help in guiding experimental design and improving the success rate of X-ray crystallography experiments. This study has developed a new machine-learning-based pipeline that uses a newly developed deep-cascade forest (DCF) model with multiple types of sequence-based features to predict protein crystallization propensity. Based on the developed pipeline, two new protein crystallization propensity predictors, denoted as DCFCrystal and MDCFCrystal, have been implemented. DCFCrystal is a multistage predictor that can estimate the success propensities of the three individual steps (production of protein material, purification and production of crystals) in the protein crystallization process. MDCFCrystal is a single-stage predictor that aims to estimate the probability that a protein will pass through the entire crystallization process. Moreover, DCFCrystal is designed for general proteins, whereas MDCFCrystal is specially designed for membrane proteins, which are notoriously difficult to crystallize. DCFCrystal and MDCFCrystal were separately tested on two benchmark datasets consisting of 12 289 and 950 proteins, respectively, with known crystallization results from various experimental records. The experimental results demonstrated that DCFCrystal and MDCFCrystal increased the value of the Matthews correlation coefficient by 199.7% and 77.8%, respectively, compared to the best of other state-of-the-art protein crystallization propensity predictors.
Detailed analyses show that the major advantages of DCFCrystal and MDCFCrystal lie in the efficiency of the DCF model and the sensitivity of the sequence-based features used, especially the newly designed pseudo-predicted hybrid solvent accessibility (PsePHSA) feature, which improves crystallization recognition by incorporating sequence-order information with solvent accessibility of residues. Meanwhile, the new crystal-dataset constructions help to train the models with more comprehensive crystallization knowledge.
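For readers unfamiliar with the metric, the Matthews correlation coefficient and a relative increase of the kind reported above can be sketched as follows. This is a generic illustration, not the authors' evaluation code; the function names and the toy numbers are hypothetical and are not taken from the paper.

```python
import math


def matthews_corrcoef(tp, fp, tn, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def relative_increase_percent(new, old):
    """Relative increase of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


# Toy confusion matrices (hypothetical counts).
perfect = matthews_corrcoef(tp=50, fp=0, tn=50, fn=0)      # all predictions correct
chance = matthews_corrcoef(tp=25, fp=25, tn=25, fn=25)     # no better than chance

# A hypothetical MCC rising from 0.10 to 0.30 is a 200% relative increase.
gain = relative_increase_percent(0.30, 0.10)
```

Because the reported improvements are relative, a large percentage can correspond to a modest absolute change when the baseline coefficient is small.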
Atlas of dynamic spectra of fast radio burst FRB 20201124A

https://doi.org/10.1088/1674-1056/aca7ed

Wang, Bo-Jun; Xu, Heng; Jiang, Jin-Chen; Xu, Jiang-Wei; Niu, Jia-Rui; Chen, Ping; Lee, Ke-Jia; Zhang, Bing; Zhu, Wei-Wei; Dong, Su-Bo; et al (February 2023, Chinese Physics B)

Fast radio bursts (FRBs) are highly dispersed millisecond-duration radio bursts [1,2] whose physical origin is still not fully understood. FRB 20201124A is one of the most actively repeating FRBs. In this paper, we present the collection of 1863 burst dynamic spectra of FRB 20201124A measured with the Five-hundred-meter Aperture Spherical radio Telescope (FAST). The current collection, taken from the observation during the FRB active phase from April to June 2021, is the largest burst sample detected for any FRB so far. The standard PSRFITS format is adopted, including the dynamic spectra of the bursts and their time information; in addition, mask files that help readers identify the pulse positions are also provided. The dataset is available in Science Data Bank at https://www.doi.org/10.57760/sciencedb.j00113.00076.